A novel way of computing similarities between nodes of a graph, with application to collaborative filtering and subspace projection of the graph nodes

Authors

  • Francois Fouss
  • Alain Pirotte
  • Jean-Michel Renders
  • Marco Saerens

Abstract

This work presents a new perspective on characterizing the similarity between elements of a database or, more generally, nodes of a weighted and undirected graph. It is based on a Markov-chain model of random walk through the database. More precisely, we compute quantities (the average commute time, the pseudoinverse of the Laplacian matrix of the graph, etc.) that provide similarities between any pair of nodes and have the nice property of increasing when the number of paths connecting those elements increases and when the “length” of paths decreases. It turns out that the square root of the average commute time is a Euclidean distance and that the pseudoinverse of the Laplacian matrix is a kernel (its elements are inner products closely related to commute times). A procedure for computing the subspace projection of the node vectors of the graph that preserves as much variance as possible in terms of the commute-time distance – a principal component analysis (PCA) of the graph – is also introduced. This graph PCA provides a nice interpretation of the “Fiedler vector”, widely used for graph partitioning. The model is evaluated on a collaborative-recommendation task where suggestions are made about which movies people should watch based upon what they watched in the past. Experimental results on the MovieLens database show that the Laplacian-based similarities (the pseudoinverse of the Laplacian matrix and the “random-forest matrix”) perform well in comparison with other methods. The model, which nicely fits into the so-called “statistical relational learning” framework, could also be used to compute document or word similarities and, more generally, could be applied to machine-learning and pattern-recognition tasks involving a database.

François Fouss, Alain Pirotte and Marco Saerens are with the Information Systems Research Unit (ISYS), IAG, Université catholique de Louvain, Place des Doyens 1, B-1348 Louvain-la-Neuve, Belgium. Email: {saerens, pirotte, fouss}@isys.ucl.ac.be.

Jean-Michel Renders is with the Xerox Research Center Europe, Chemin de Maupertuis 6, 38240 Meylan (Grenoble), France. Email: [email protected].
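The quantities named in the abstract can be illustrated with a small NumPy sketch. It assumes a connected graph given as a symmetric, non-negative adjacency matrix A, and the helper names (laplacian, laplacian_pinv, commute_times, ectd, graph_pca) are hypothetical, not taken from the paper. It relies on the standard identity n(i, j) = V_G (l+_ii + l+_jj - 2 l+_ij), where l+_ij are the elements of the Laplacian pseudoinverse L+ and V_G is the volume of the graph (the sum of all node degrees). The graph PCA is sketched as a projection of the nodes onto the leading eigenvectors of L+. This is an illustration under those assumptions, not the authors' implementation.

```python
import numpy as np


def laplacian(A):
    """Laplacian L = D - A of a weighted, undirected graph (A symmetric, non-negative)."""
    A = np.asarray(A, dtype=float)
    return np.diag(A.sum(axis=1)) - A


def laplacian_pinv(A):
    """Moore-Penrose pseudoinverse L+ of the Laplacian (the kernel matrix)."""
    return np.linalg.pinv(laplacian(A))


def commute_times(A):
    """Average commute times n(i, j) = V_G * (l+_ii + l+_jj - 2 * l+_ij)."""
    A = np.asarray(A, dtype=float)
    Lp = laplacian_pinv(A)
    d = np.diag(Lp)
    volume = A.sum()  # V_G: sum of all node degrees
    return volume * (d[:, None] + d[None, :] - 2.0 * Lp)


def ectd(A):
    """Euclidean commute-time distance: square root of the average commute time."""
    # Clip tiny negative round-off (e.g. on the diagonal) before taking the root.
    return np.sqrt(np.maximum(commute_times(A), 0.0))


def graph_pca(A, m):
    """Coordinates of the nodes on the m leading principal axes of the graph.

    Assumes a connected graph and m <= n - 1. The axes are the eigenvectors of L+
    with the largest eigenvalues; the first one belongs to the smallest non-zero
    eigenvalue of L, so (when that eigenvalue is simple) it is proportional to
    the Fiedler vector. Row i of the result is node i; with m = n - 1, squared
    distances between rows, multiplied by V_G, reproduce the commute times.
    """
    vals, vecs = np.linalg.eigh(laplacian_pinv(A))  # ascending eigenvalues
    order = np.argsort(vals)[::-1][:m]               # largest eigenvalues first
    return vecs[:, order] * np.sqrt(np.maximum(vals[order], 0.0))


if __name__ == "__main__":
    # Path graph 0 - 1 - 2 with unit weights.
    A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
    print(np.round(commute_times(A), 6))
    print(np.round(graph_pca(A, 1), 6))
```

For the path graph 0 - 1 - 2, the commute time between the two end nodes is 8 (the hitting time is 4 in each direction) and between adjacent nodes it is 4, which can be checked by hand against the output of commute_times. Using np.linalg.pinv keeps the sketch short; for large, sparse graphs one would avoid forming the dense pseudoinverse.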


Similar sources

A novel way of computing dissimilarities between nodes of a graph, with application to collaborative filtering and subspace projection of the graph nodes

This work presents a new perspective on characterizing the similarity between elements of a database or, more generally, nodes of a weighted, undirected, graph. It is based on a Markov-chain model of random walk through the database. More precisely, we compute quantities (the average commute time, the pseudoinverse of the Laplacian matrix of the graph, etc) that provide simil...

LPKP: location-based probabilistic key pre-distribution scheme for large-scale wireless sensor networks using graph coloring

Communication security of wireless sensor networks is achieved using cryptographic keys assigned to the nodes. Due to resource constraints in such networks, random key pre-distribution schemes are of high interest. Although in most of these schemes no location information is considered, there are scenarios that location information can be obtained by nodes after their deployment. In this paper,...

The topological ordering of covering nodes

The topological ordering algorithm sorts nodes of a directed graph such that the order of the tail of each arc is lower than the order of its head. In this paper, we introduce the notion of covering between nodes of a directed graph. Then, we apply the topological ordering algorithm on graphs containing the covering nodes. We show that there exists a cut set with forward arcs in these...

A Stock Market Filtering Model Based on Minimum Spanning Tree in Financial Networks

There have been several efforts in the literature to extract as much information as possible from the financial networks. Most of the research has been concerned about the hierarchical structures, clustering, topology and also the behavior of the market network; but not a notable work on the network filtration exists. This paper proposes a stock market filtering model using the correlation - ba...

Sampling from social networks’s graph based on topological properties and bee colony algorithm

In recent years, the sampling problem in massive graphs of social networks has attracted much attention for fast analyzing a small and good sample instead of a huge network. Many algorithms have been proposed for sampling of social network’ graph. The purpose of these algorithms is to create a sample that is approximately similar to the original network’s graph in terms of properties such as de...

Publication date: 2006